I sequenced the complete insert of the pDNA library of pMT02. I have already extracted all sequences in front of the 3’ adapter from the sequencing data and counted identical sequences with starcode. I now want an overview of how many insert sequences in the pDNA library still match the designed inserts.
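The overview described above could be sketched as follows. This is only a minimal illustration, assuming the starcode output is available as a data frame with the columns `sequence` and `number` and that the designed inserts are available as a character vector. The toy sequences and the names `pDNA_counts`, `designed_inserts`, `overview` and `missing_inserts` are hypothetical and not taken from the original script.

```r
# Toy stand-in for the starcode output (sequence + read count); values are made up
pDNA_counts <- data.frame(
  sequence = c("ACGTACGT", "ACGTACGA", "TTGACCAA", "GGGGCCCC"),
  number   = c(120, 3, 80, 5),
  stringsAsFactors = FALSE
)

# Hypothetical designed insert sequences
designed_inserts <- c("ACGTACGT", "TTGACCAA", "CCATGGAA")

# Flag sequenced inserts that exactly match a designed insert
pDNA_counts$match <- pDNA_counts$sequence %in% designed_inserts

# Overview: number of unique sequences and number of reads per class
overview <- data.frame(
  match       = c(TRUE, FALSE),
  n_sequences = c(sum(pDNA_counts$match), sum(!pDNA_counts$match)),
  n_reads     = c(sum(pDNA_counts$number[pDNA_counts$match]),
                  sum(pDNA_counts$number[!pDNA_counts$match]))
)
overview$pct_reads <- round(100 * overview$n_reads / sum(overview$n_reads), 1)
print(overview)

# Designed inserts that were never observed in the sequencing data
missing_inserts <- setdiff(designed_inserts, pDNA_counts$sequence)
print(missing_inserts)
```

Exact matching is the strictest possible criterion. An analysis that tolerates sequencing errors would need a distance-based comparison instead, which this sketch does not attempt.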
Functions used throughout this script.
## Parsed with column specification:
## cols(
## sequence = col_character(),
## number = col_double()
## )
## Users can try to set parallel.cores = -1 to use all cores!
## Processing... 2021-02-25 15:12:04
## Calculating GC content...
##
## Completed. 2021-02-25 15:12:09
## No trace type specified:
## Based on info supplied, a 'scatter' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#scatter
## No scatter mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
## [1] "4% progress"
## [1] "8% progress"
## [1] "9% progress"
## [1] "12% progress"
## [1] "13% progress"
## [1] "16% progress"
## [1] "17% progress"
## [1] "21% progress"
## [1] "25% progress"
## [1] "29% progress"
## [1] "33% progress"
## [1] "34% progress"
## [1] "37% progress"
## [1] "38% progress"
## [1] "41% progress"
## [1] "42% progress"
## [1] "46% progress"
## [1] "50% progress"
## [1] "54% progress"
## [1] "58% progress"
## [1] "59% progress"
## [1] "62% progress"
## [1] "63% progress"
## [1] "66% progress"
## [1] "67% progress"
## [1] "71% progress"
## [1] "75% progress"
## [1] "79% progress"
## [1] "83% progress"
## [1] "84% progress"
## [1] "87% progress"
## [1] "88% progress"
## [1] "91% progress"
## [1] "92% progress"
## [1] "96% progress"
## [1] "100% progress"
Barcodes that are clearly assigned to the wrong insert can be reassigned to the correct insert. Barcodes that are attached to a mixed population of inserts should be excluded from any analysis in which this plasmid library was used.
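One way to separate these two cases is sketched below. It assumes a table with one row per observed barcode-insert combination and a purity cutoff on the fraction of reads that support the dominant insert. The column names, the toy values, the `min_fraction` cutoff of 0.9 and the helper `classify_barcode()` are all hypothetical and do not reproduce the original analysis.

```r
# Toy table: one row per observed barcode-insert combination (values are made up)
obs <- data.frame(
  barcode  = c("AAAC", "AAAC", "CCGT", "CCGT", "GGTA"),
  designed = c("ins1", "ins1", "ins2", "ins2", "ins3"),  # insert the barcode was designed for
  observed = c("ins1", "ins4", "ins5", "ins2", "ins3"),  # insert that was actually sequenced
  reads    = c(50, 45, 97, 3, 60),
  stringsAsFactors = FALSE
)

min_fraction <- 0.9  # assumed purity cutoff, not taken from the original analysis

# Classify one barcode from all of its observed barcode-insert rows
classify_barcode <- function(df) {
  top  <- df[which.max(df$reads), ]      # dominant insert for this barcode
  frac <- top$reads / sum(df$reads)      # fraction of reads supporting it
  status <- if (frac < min_fraction) {
    "exclude"                            # mixed population of inserts
  } else if (top$observed != top$designed) {
    "replace"                            # consistently attached to another insert
  } else {
    "correct"
  }
  data.frame(barcode = top$barcode, top_insert = top$observed,
             fraction = frac, status = status, stringsAsFactors = FALSE)
}

# Apply the classification per barcode and combine the results
bc_status <- do.call(rbind, lapply(split(obs, obs$barcode), classify_barcode))
rownames(bc_status) <- NULL
print(bc_status)
```

In this toy example, `AAAC` is split almost evenly between two inserts and is marked for exclusion, whereas `CCGT` is almost exclusively attached to a single, unintended insert and is marked for reassignment.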
# # Export barcodes that are attached to multiple inserts
# bc_exclude <- matching_df_exclude$barcode %>% unique()
# write.csv(bc_exclude, "/DATA/usr/m.trauernicht/projects/SuRE-TF/data/pDNA_insert_seq/bc_exclude.csv")
#
# # Export barcodes that are attached to the wrong insert
# bc_replace <- pDNA_seq_incorrect %>% dplyr::select(barcode, `bc-match`, `insert-match`) %>% unique()
# write.csv(bc_replace, "/DATA/usr/m.trauernicht/projects/SuRE-TF/data/pDNA_insert_seq/bc_replace.csv")
paste("Run time: ",format(Sys.time()-StartTime))
## [1] "Run time: 11.72874 mins"
getwd()
## [1] "/DATA/usr/m.trauernicht/projects/SuRE-TF/pDNA_insert_seq"
date()
## [1] "Thu Feb 25 15:23:16 2021"
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.7 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] tibble_3.0.1 plotly_4.9.2.1
## [3] LncFinder_1.1.4 sunburstR_2.1.4
## [5] d3r_0.9.0 vwr_0.3.0
## [7] latticeExtra_0.6-29 lattice_0.20-38
## [9] stringdist_0.9.5.5 ggbeeswarm_0.6.0
## [11] ggplot2_3.3.0 dplyr_0.8.5
## [13] readr_1.3.1 tidyr_1.0.0
## [15] phylotools_0.2.2 ape_5.4-1
## [17] maditr_0.6.3 plyr_1.8.6
## [19] ShortRead_1.42.0 GenomicAlignments_1.20.1
## [21] SummarizedExperiment_1.14.1 DelayedArray_0.10.0
## [23] matrixStats_0.55.0 Biobase_2.44.0
## [25] Rsamtools_2.0.3 GenomicRanges_1.36.1
## [27] GenomeInfoDb_1.20.0 Biostrings_2.52.0
## [29] XVector_0.24.0 IRanges_2.18.3
## [31] S4Vectors_0.22.1 BiocParallel_1.18.1
## [33] BiocGenerics_0.30.0 seqinr_3.6-1
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.4-1 hwriter_1.3.2 ellipsis_0.3.0
## [4] class_7.3-15 farver_2.0.1 prodlim_2019.11.13
## [7] lubridate_1.7.4 codetools_0.2-16 splines_3.6.3
## [10] knitr_1.30 ade4_1.7-13 jsonlite_1.7.1
## [13] pROC_1.16.1 caret_6.0-85 png_0.1-7
## [16] shiny_1.4.0 compiler_3.6.3 httr_1.4.1
## [19] fastmap_1.0.1 assertthat_0.2.1 Matrix_1.2-18
## [22] lazyeval_0.2.2 later_1.1.0.1 htmltools_0.5.0
## [25] tools_3.6.3 gtable_0.3.0 glue_1.4.2
## [28] GenomeInfoDbData_1.2.1 reshape2_1.4.4 Rcpp_1.0.5
## [31] vctrs_0.2.4 nlme_3.1-143 iterators_1.0.12
## [34] crosstalk_1.0.0 timeDate_3043.102 gower_0.2.1
## [37] xfun_0.19 stringr_1.4.0 mime_0.9
## [40] lifecycle_0.2.0 zlibbioc_1.30.0 MASS_7.3-51.5
## [43] scales_1.1.0 ipred_0.9-9 promises_1.1.1
## [46] hms_0.5.3 RColorBrewer_1.1-2 yaml_2.2.1
## [49] rpart_4.1-15 stringi_1.5.3 foreach_1.4.7
## [52] e1071_1.7-4 lava_1.6.6 rlang_0.4.8
## [55] pkgconfig_2.0.3 bitops_1.0-6 evaluate_0.14
## [58] purrr_0.3.3 labeling_0.3 recipes_0.1.9
## [61] htmlwidgets_1.5.2 tidyselect_1.1.0 magrittr_1.5
## [64] R6_2.5.0 generics_0.0.2 pillar_1.4.3
## [67] withr_2.1.2 survival_3.1-8 RCurl_1.95-4.12
## [70] nnet_7.3-12 crayon_1.3.4 rmarkdown_2.5
## [73] jpeg_0.1-8.1 grid_3.6.3 data.table_1.12.8
## [76] ModelMetrics_1.2.2.1 digest_0.6.27 xtable_1.8-4
## [79] httpuv_1.5.4 munsell_0.5.0 beeswarm_0.2.3
## [82] viridisLite_0.3.0 vipor_0.4.5